Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts

Authors

  • Takashi Nose
  • Yusuke Arao
  • Takao Kobayashi
  • Komei Sugiura
  • Yoshinori Shiga
  • Akinori Ito
Abstract

This paper proposes a sentence selection method that uses a maximum entropy criterion to construct recording scripts for speech synthesis. In conventional corpus design for speech synthesis, a greedy algorithm that maximizes phonetic coverage is often used. However, for statistical parametric speech synthesis, phonetic and prosodic contextual balance is important in addition to coverage. To take both phonetic and prosodic contextual balance into account in sentence selection, we introduce and maximize the entropy of the phonetic and prosodic contexts, such as biphone, triphone, accent, and sentence length. The objective experimental results show that the proposed method achieves better coverage and balance of contexts and reduces spectral and F0 distortions compared to the random and coverage-based sentence selection methods.
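The idea of selecting sentences by maximizing context entropy can be illustrated with a minimal sketch. The abstract does not give the search procedure or how the different contexts are combined, so the following assumes a simple greedy loop and uses biphone contexts only; the function names `biphones`, `entropy`, and `select_by_entropy` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def biphones(phones):
    """Return the adjacent phone pairs (biphones) of one sentence."""
    return list(zip(phones, phones[1:]))


def entropy(counts):
    """Shannon entropy in bits of the distribution described by a Counter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def select_by_entropy(sentences, n_select):
    """Greedily pick sentences that maximize biphone entropy of the selected set.

    Assumption: a greedy search over biphone counts only. The paper also
    mentions triphone, accent, and sentence-length contexts, which are
    omitted here.
    `sentences` is a list of phone sequences; returns indices in pick order.
    """
    selected = []
    counts = Counter()
    remaining = set(range(len(sentences)))
    while remaining and len(selected) < n_select:
        best_idx, best_h = None, -1.0
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            h = entropy(counts + Counter(biphones(sentences[i])))
            if h > best_h:
                best_idx, best_h = i, h
        selected.append(best_idx)
        counts.update(biphones(sentences[best_idx]))
        remaining.remove(best_idx)
    return selected


if __name__ == "__main__":
    # Toy corpus: each sentence is already a phone sequence.
    corpus = [
        ["a", "b", "a", "b", "a"],
        ["a", "b", "c", "d"],
        ["c", "d", "c", "d"],
    ]
    print(select_by_entropy(corpus, 2))
```

A coverage-based criterion would instead score each candidate by the number of new biphone types it adds; the entropy score additionally rewards sets whose context counts are evenly distributed, which is the balance property the abstract emphasizes.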


Similar resources

Unit Selection Speech Synthesis Using Phonetic-Prosodic Description of Speech Databases

This paper describes an approach to speech synthesis based on using speech databases at different stages of the TTS process. Speech database units are phones in different segmental and prosodic contexts. Pitch-synchronous segmentation and labeling of databases allows storing both segmental and prosodic information. Phonetic-prosodic annotations of speech databases are involved in off-line training ...

Corpus Creation for Polish Unit Selection Speech Synthesis

This paper describes the process of creating a speech corpus for Polish Unit Selection speech synthesis. This task is time-consuming and manually designing the corpus is, in practice, only applicable in Limited Domain Speech Synthesis and Recognition. The sentence selection tools used while designing the corpus are usually based on the Greedy algorithm. The algorithm looks for sentences which cov...

On building phonetically and prosodically rich speech corpus for text-to-speech synthesis

This paper proposes a way of preparing and recording a speech corpus for unit selection text-to-speech speech synthesis driven by symbolic prosody. The research is focused on a phonetically and prosodically rich sentence selection algorithm. Symbolic description on a deep prosody level is used to enrich the phonetic representation of sentences (by respecting the prosodeme types phones appear in...

Creation and analysis of a Polish speech database for use in unit selection synthesis

The main aim of this study is to describe the process of creating a speech database to be used in corpus based text-to-speech synthesis. To help achieve natural sounding speech synthesis, the database construction was aimed at rich phonetic and prosodic coverage based on variable length units (phoneme, diphone, triphone) from different phonetic and prosodic contexts. Following previous work on ...

Design of a Mandarin Sentence Set for C by Use of a Multi-tier Algorithm Tak Prosodic and Spectral Ch

This paper presents a multi-tier algorithm to extract a sentence set from a large raw text corpus for synthesis of Mandarin speech, taking account of varied prosodic and spectral characteristics. The prosodic and spectral characteristics are statistically analyzed from the text corpus and transcribed as syllable-sized unit candidates in a multi-tier way. The unit candidates cover all of the syl...

Publication date: 2015